Add methods for getting bytes + json to store abc#3638
Add methods for getting bytes + json to store abc#3638d-v-b merged 19 commits intozarr-developers:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3638 +/- ##
==========================================
+ Coverage 60.90% 60.99% +0.09%
==========================================
Files 86 86
Lines 10174 10244 +70
==========================================
+ Hits 6196 6248 +52
- Misses 3978 3996 +18
🚀 New features to boost your workflow:
|
|
What is the advantage of making these helpers part of the ABC, instead of helper functions or a new concrete Store? Is the intention that certain implementations will override these? If so, what would be the advantages of overriding? |
Putting this functionality on the ABC ensures that all stores implement it, without making per-store changes. Reading JSON is not actually new functionality, and ultimately I want to use these helpers internally (specifically, the async ones for reading JSON). I didn't do that in this PR because it requires changes to the
Yes, in this PR the localstore and memorystore implementations override these to allow The advantage in this case is to work around the cumbersome requirement that store methods take a |
| ... | ||
|
|
||
| async def get_bytes_async( | ||
| self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None |
There was a problem hiding this comment.
since the output of this method is a bytes object (CPU memory), I don't see the value of specifying a prototype in this method. can anyone suggest a situation where specifying the prototype would be useful here?
There was a problem hiding this comment.
I think I agree with you, that a prototype probably isn't needed. But I'd suggest including it for consistency with the other methods.
TomAugspurger
left a comment
There was a problem hiding this comment.
Since methods like get are asynchronous, I'd recommend making these convenience methods async-first and add a _sync suffix for the sync versions. get_bytes / get_bytes_async, etc.
| ... | ||
|
|
||
| async def get_bytes_async( | ||
| self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None |
There was a problem hiding this comment.
I think I agree with you, that a prototype probably isn't needed. But I'd suggest including it for consistency with the other methods.
src/zarr/abc/store.py
Outdated
| Examples | ||
| -------- | ||
| >>> store = MemoryStore() | ||
| >>> store.set("data", Buffer.from_bytes(b"hello world")) |
There was a problem hiding this comment.
I think this needs an await.
| ---------- | ||
| key : str, optional | ||
| The key identifying the data to retrieve. Defaults to an empty string. | ||
| prototype : BufferPrototype |
There was a problem hiding this comment.
We'll want to align this with what we do in the async version.
| >>> print(data) | ||
| b'hello' | ||
| """ | ||
| if prototype is None: |
There was a problem hiding this comment.
Can't this be done in the base class, and then all the child classes reuse the same implementation?
There was a problem hiding this comment.
I considered this direction. It would definitely be more convenient, but coupling the abstract store definition to the behavior of the default_buffter_prototype(), which in turn depends on the global config, seems much worse than other options (namely, baking buffer type into the store definition, or into store creation)
@jhamman had the same preference and I haven't heard anyone strongly support the current state of this PR, so I'll make this change |
|
|
|
in the dev meeting this week @maxrjones suggested making the new methods private API, so: Is that an accurate summary max? |
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
Footnotes
|
this was my off-hand reaction, but I am glad to defer to those with better API design instincts |
|
If it makes this PR more palatable, I'm happy making stuff private for now! |
|
the new methods are now all private. Since "read JSON" is indisputably a core behavior of a Zarr store, I think we should make these public sooner rather than later. |
|
As the methods are private, I removed the release notes, so I think this is ready for final review |
maxrjones
left a comment
There was a problem hiding this comment.
The prototype argument is awkward IMO, but seems worthwhile to merge this and try it out more to see whether that awkwardness matters.
maxrjones
left a comment
There was a problem hiding this comment.
The prototype argument is awkward IMO, but seems worthwhile to merge this and try it out more to see whether that awkwardness matters.
Agreed but that's inherited awkwardness -- we have to call |
Adds some convenience methods to the store abc:
get_bytes_asyncget_json_asyncget_bytesget_jsonThe async methods have a
_asyncsuffix.get_bytesandget_jsonare synchronous.this PR allows the following, which is a huge QOL improvement for people working with Zarr stores.